Three-model analysis report

Generated by scripts/generate_three_model_report.py

Summary

model agg_key analyzed total missing purity_true purity_false llm_true llm_false agreement_true_total agreement_true_agree agreement_false_total agreement_false_agree
deepseek deepseek-r1_8b_floss_hashes_no_rpt_purity_with_analysis 5841 5841 0 0 2425 210 4492 0 0 2425 1871
mistral mistral_latest_floss_hashes_no_rpt_purity_with_analysis 5841 5841 0 0 2425 340 5052 0 0 2425 2157
gemma gemma2_2b_floss_hashes_no_rpt_purity_with_analysis 5841 5841 0 0 2425 1084 4736 0 0 2425 1986
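
The agreement columns can be turned into per-model rates. The following is a minimal sketch, assuming the rate is agreement_false_agree divided by agreement_false_total; the report itself does not define this ratio, and the names summary, agreement_rate and false_rates are hypothetical. The counts are copied from the table above. Since agreement_true_total is 0 for every model, the corresponding rate on the true side is undefined and is returned as None.

```python
from typing import Optional

# Counts copied from the Summary table (agreement_false_* columns).
summary = {
    "deepseek": {"agreement_false_total": 2425, "agreement_false_agree": 1871},
    "mistral": {"agreement_false_total": 2425, "agreement_false_agree": 2157},
    "gemma": {"agreement_false_total": 2425, "agreement_false_agree": 1986},
}


def agreement_rate(agree: int, total: int) -> Optional[float]:
    """Return agree / total, or None when there is nothing to compare."""
    return agree / total if total else None


# Assumed definition: share of purity_false commits where the LLM agreed.
false_rates = {
    model: agreement_rate(row["agreement_false_agree"], row["agreement_false_total"])
    for model, row in summary.items()
}

for model, rate in false_rates.items():
    print(f"{model}: {rate:.1%}")
```

Under this assumed definition, mistral agrees with the purity_false label most often, followed by gemma and then deepseek.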

Coverage & Predictions

Missing / Failed commits breakdown

deepseek — failed_count: 1139

purity_analysis  count
NONE             610
FALSE            529

mistral — failed_count: 449

purity_analysis  count
NONE             244
FALSE            205

gemma — failed_count: 21

purity_analysis  count
NONE             17
FALSE            4
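
The failed counts can be cross-checked against the Summary table. The sketch below assumes that failed_count is the number of analyzed commits with no usable LLM prediction, that is, analyzed minus llm_true minus llm_false. The report does not state this definition; it is inferred only because the arithmetic matches for all three models. The names report and is_consistent are hypothetical.

```python
# Values copied from the Summary table and the breakdown tables in this report.
report = {
    "deepseek": {
        "analyzed": 5841, "llm_true": 210, "llm_false": 4492,
        "failed_count": 1139, "breakdown": {"NONE": 610, "FALSE": 529},
    },
    "mistral": {
        "analyzed": 5841, "llm_true": 340, "llm_false": 5052,
        "failed_count": 449, "breakdown": {"NONE": 244, "FALSE": 205},
    },
    "gemma": {
        "analyzed": 5841, "llm_true": 1084, "llm_false": 4736,
        "failed_count": 21, "breakdown": {"NONE": 17, "FALSE": 4},
    },
}


def is_consistent(row: dict) -> bool:
    """Check failed_count against both the summary counts and the breakdown."""
    # Assumed definition: commits that received neither an llm_true nor an llm_false label.
    without_prediction = row["analyzed"] - row["llm_true"] - row["llm_false"]
    breakdown_total = sum(row["breakdown"].values())
    return without_prediction == row["failed_count"] == breakdown_total


results = {model: is_consistent(row) for model, row in report.items()}
print(results)
```

A check like this could run at the end of report generation to catch a mismatch between the summary counts and the per-model breakdowns.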